Conditional neural control variates for variance reduction in Bayesian inverse problems
Bayesian inference for inverse problems involves computing expectations under posterior distributions -- e.g., posterior means, variances, or predictive quantities -- typically via Monte Carlo (MC) estimation. When the quantity of interest varies significantly under the posterior, accurate estimates demand many samples -- a cost often prohibitive for partial differential equation-constrained problems. To address this challenge, we introduce conditional neural control variates, a modular method that learns amortized control variates from joint model-data samples to reduce the variance of MC estimators. To scale to high-dimensional problems, we leverage Stein's identity to design an architecture based on an ensemble of hierarchical coupling layers with tractable Jacobian trace computation. Training requires: (i) samples from the joint distribution of unknown parameters and observed data; and (ii) the posterior score function, which can be computed from physics-based likelihood evaluations, neural operator surrogates, or learned generative models such as conditional normalizing flows. Once trained, the control variates generalize across observations without retraining. We validate our approach on stylized and partial differential equation-constrained Darcy flow inverse problems, demonstrating substantial variance reduction, even when the analytical score is replaced by a learned surrogate.
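The mechanism behind this abstract is easy to state in one dimension: Stein's identity says that for a density p with score s(x) = (log p(x))', any sufficiently regular function u yields a zero-mean function g(x) = u'(x) + u(x)s(x) under p, so subtracting a fitted multiple of g from the integrand leaves the Monte Carlo estimator unbiased while (ideally) shrinking its variance. A minimal NumPy sketch under a standard normal, with a hand-picked u(x) = x in place of the paper's learned, amortized construction:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)   # samples from p = N(0, 1)
f = x**2                          # quantity of interest; E[f] = 1

# Stein's identity: with score s(x) = d/dx log p(x) = -x, any smooth u gives
# a zero-mean g(x) = u'(x) + u(x) * s(x).  Choosing u(x) = x: g(x) = 1 - x**2.
g = 1.0 - x**2

# Variance-optimal control-variate coefficient by least squares.
C = np.cov(f, g)
a = C[0, 1] / C[1, 1]

est_plain = f.mean()
est_cv = (f - a * g).mean()       # still unbiased, since E[g] = 0
print(f"plain MC: {est_plain:.4f} +/- {f.std(ddof=1) / np.sqrt(x.size):.4f}")
print(f"with CV : {est_cv:.4f} +/- {(f - a * g).std(ddof=1) / np.sqrt(x.size):.4f}")
```

With f(x) = x², the choice u(x) = x happens to make the control variate perfect and the variance collapses to zero; in general the paper learns u (and hence g) with a neural network, where the divergence term is the Jacobian trace mentioned in the abstract.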
NeurIPS2021_emergent_group_communication.pdf
We generate 128,000 images as agents' observations using Python's matplotlib library [Hunter, 2007]. A variational autoencoder [Kingma and Welling, 2014] is used to encode the observations. The input is a flattened 30,720-dimensional vector (32 by 320 by 3). Both the encoder and decoder have one hidden layer of dimension 1,024. The output (the communication message) is a 10-dimensional vector. ReLU is used as the activation function.
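A minimal PyTorch sketch of the stated architecture (30,720-d flattened input, one 1,024-unit hidden layer on each side, 10-d message, ReLU); the Gaussian latent head and sigmoid output are assumptions, as the excerpt does not specify them:

```python
import torch
import torch.nn as nn

class MessageVAE(nn.Module):
    """VAE with the stated sizes: 30,720-d input (32 x 320 x 3), one
    1,024-unit hidden layer in both encoder and decoder, 10-d latent
    message, ReLU activations."""
    def __init__(self, in_dim=32 * 320 * 3, hidden=1024, latent=10):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent)       # assumed Gaussian head
        self.logvar = nn.Linear(hidden, latent)   # assumed Gaussian head
        self.dec = nn.Sequential(
            nn.Linear(latent, hidden), nn.ReLU(),
            nn.Linear(hidden, in_dim), nn.Sigmoid(),  # pixels in [0, 1]
        )

    def forward(self, x):
        h = self.enc(x.flatten(1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterize
        return self.dec(z), mu, logvar

recon, mu, logvar = MessageVAE()(torch.rand(8, 3, 32, 320))  # dummy batch
```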
Precision of Individual Shapley Value Explanations
Shapley values are extensively used in explainable artificial intelligence (XAI) as a framework to explain predictions made by complex machine learning (ML) models. In this work, we focus on conditional Shapley values for predictive models fitted to tabular data and explain the prediction $f(\boldsymbol{x}^{*})$ for a single observation $\boldsymbol{x}^{*}$ at a time. Numerous Shapley value estimation methods have been proposed and empirically compared on an average basis in the XAI literature. However, less focus has been devoted to analyzing the precision of the Shapley value explanations on an individual basis. We extend our work in Olsen et al. (2023) by demonstrating and discussing that, for all the estimation methods considered, the explanations are systematically less precise for observations in the outer regions of the training data distribution. This is expected from a statistical point of view, but to the best of our knowledge, it has not been systematically addressed in the Shapley value literature. This is crucial knowledge for Shapley value practitioners, who should exercise more caution when applying the Shapley value explanations of such observations.
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > Norway > Eastern Norway > Oslo (0.04)
K-Nearest Neighbors, Naive Bayes, and Decision Tree in 10 Minutes
Unlike linear models and SVMs (see Part 1), some machine learning models are hard to understand from their mathematical formulation alone. Fortunately, they can be understood by following, step by step, the process they execute on a small dummy dataset. This way, you can look at machine learning models under the hood without the "math bottleneck". After Part 1, you will learn three more models in this story: K-Nearest Neighbors (KNN), Naive Bayes, and Decision Tree. KNN is a non-generalizing machine learning model, since it simply "remembers" all of its training data (a minimal sketch follows after the tags below).
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.88)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.64)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.61)
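Since KNN is fully described by "remember the training data, then vote among the nearest points", it fits in a few lines of NumPy. A minimal sketch on a dummy dataset (names and data are illustrative):

```python
import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x_new, axis=1)  # Euclidean distance to all
    nearest = np.argsort(dists)[:k]                  # indices of the k closest
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Dummy dataset: two well-separated 2-D clusters.
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],   # class 0
              [6.0, 6.0], [6.5, 7.0], [7.0, 6.5]])  # class 1
y = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X, y, np.array([2.0, 2.0])))  # -> 0
print(knn_predict(X, y, np.array([6.0, 7.0])))  # -> 1
```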
Using Shapley Values and Variational Autoencoders to Explain Predictive Models with Dependent Mixed Features
Olsen, Lars Henry Berge, Glad, Ingrid Kristine, Jullum, Martin, Aas, Kjersti
Explainable artificial intelligence (XAI) and interpretable machine learning (IML) have become active research fields in recent years (Adadi and Berrada 2018; Molnar 2019). This is a natural consequence of complex machine learning (ML) models now being applied to supervised learning problems in many high-risk areas: cancer prognosis (Kourou et al. 2015), credit scoring (Kvamme et al. 2018), and money laundering detection (Jullum, Løland, et al. 2020). The high prediction accuracy of complex ML models often comes at the expense of model interpretability. As the goal of science is to gain knowledge from the collected data, the use of black-box models hinders the understanding of the underlying relationship between the features and the response, and thereby curtails scientific discovery. Model explanation frameworks from the XAI field extract the hidden knowledge about the underlying data structure captured by a black-box model, thereby making the model's decision-making process transparent. This is crucial for, e.g., medical researchers who apply an ML model to obtain well-performing predictions but also strive to discover important risk factors. Another driving factor is the Right to Explanation legislation in the EU's General Data Protection Regulation (GDPR) (European Commission 2016).
- Europe > Austria > Vienna (0.14)
- Oceania > Australia > Tasmania (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- (5 more...)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Government > Regional Government > Europe Government (0.34)
MCCE: Monte Carlo sampling of realistic counterfactual explanations
Redelmeier, Annabelle, Jullum, Martin, Aas, Kjersti, Løland, Anders
In this paper we introduce MCCE: Monte Carlo sampling of realistic Counterfactual Explanations, a model-based method that generates counterfactual explanations by producing a set of feasible examples using conditional inference trees. Unlike algorithmic counterfactual methods, which have to solve complex optimization problems, or other model-based methods, which model the data distribution using heavy machine learning models, MCCE consists of only two lightweight steps (generation and post-processing). MCCE is also straightforward for the end user to understand and implement, handles any type of predictive model and feature, takes actionability constraints into account when generating the counterfactual explanations, and generates as many counterfactual explanations as needed. Alongside the method itself, we give a comprehensive list of performance metrics that can be used to compare counterfactual explanations. We also compare MCCE with a range of state-of-the-art methods and a new baseline method on benchmark data sets. MCCE outperforms all model-based methods and most algorithmic methods when validity (i.e., a correctly changed prediction) and actionability constraints are also taken into account. Finally, we show that MCCE performs almost as well when given only a small subset of the training data.
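A heavily hedged sketch of the two-step structure described above; `sample_conditional` stands in for the paper's conditional inference trees (available in R's partykit rather than in scikit-learn), and all names here are illustrative, not MCCE's actual interface:

```python
import numpy as np

def mcce_sketch(x_star, X_train, model, fixed_idx, sample_conditional, n=1000):
    # Step 1 (generation): draw n candidate rows from an estimate of the data
    # distribution conditioned on x_star's immutable features.  In the paper
    # this sampler is built from conditional inference trees; here it is a
    # placeholder for any conditional sampler.
    cands = sample_conditional(X_train, x_star, fixed_idx, n)

    # Step 2 (post-processing): keep candidates that flip the prediction
    # (validity); actionability is handled by conditioning on fixed_idx.
    valid = cands[model.predict(cands) != model.predict(x_star[None])[0]]
    if len(valid) == 0:
        return None

    # Return the valid candidate closest to x_star (standardized L1 distance).
    scale = X_train.std(axis=0) + 1e-9
    dist = np.abs((valid - x_star) / scale).sum(axis=1)
    return valid[np.argmin(dist)]
```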
Explaining predictive models using Shapley values and non-parametric vine copulas
Aas, Kjersti, Nagler, Thomas, Jullum, Martin, Løland, Anders
The original development of Shapley values for prediction explanation relied on the assumption that the features being described were independent. If the features are in reality dependent, this may lead to incorrect explanations. Hence, there have recently been attempts to appropriately model and estimate the dependence between the features. Although the proposed methods clearly outperform the traditional approach of assuming independence, they have their weaknesses. In this paper we propose two new approaches for modelling the dependence between the features. Both approaches are based on vine copulas, which are flexible tools for modelling multivariate non-Gaussian distributions, able to characterise a wide range of complex dependencies. The performance of the proposed methods is evaluated on simulated data sets and a real data set. The experiments demonstrate that the vine copula approaches give more accurate approximations to the true Shapley values than their competitors.
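For context, the conditional Shapley values that these papers estimate combine the classical coalition weights with contribution functions v(S) = E[f(x) | x_S = x*_S]; the dependence model (vine copulas here, conditional inference trees elsewhere) only enters through the conditional sampler. A hedged NumPy sketch with exact coalition enumeration (feasible only for small feature counts) and an illustrative `sample_cond` interface:

```python
import math
from itertools import combinations
import numpy as np

def conditional_shapley(f, x_star, sample_cond, d, n_mc=500):
    # v(S) = E[f(x) | x_S = x_star_S], estimated by Monte Carlo.  The
    # dependence model lives inside sample_cond, which must return n_mc
    # full feature vectors with the features in S clamped to x_star and
    # the remaining features drawn from the conditional distribution.
    def v(S):
        return f(sample_cond(S, x_star, n_mc)).mean()  # f assumed vectorized

    phi = np.zeros(d)
    for j in range(d):
        others = [i for i in range(d) if i != j]
        for m in range(d):                             # coalition sizes 0 .. d-1
            for S in combinations(others, m):
                # Classical Shapley weight |S|! (d - |S| - 1)! / d!
                w = math.factorial(m) * math.factorial(d - m - 1) / math.factorial(d)
                phi[j] += w * (v(S + (j,)) - v(S))
    return phi
```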
Most Popular Distance Metrics Used in KNN and When to Use Them - KDnuggets
KNN is one of the most commonly used and simplest algorithms for finding patterns in classification and regression problems. It is a supervised algorithm, also known as a lazy learning algorithm. It works by calculating the distance of a test observation from all the observations in the training dataset and then finding its K nearest neighbors. This happens for each test observation, and that is how it finds similarities in the data. For calculating distances, KNN uses a distance metric from the list of available metrics.
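The article's own metric list is not in this excerpt, but the most common choices are one-liners in NumPy; Minkowski generalizes Manhattan (p = 1) and Euclidean (p = 2). A quick sketch:

```python
import numpy as np

a, b = np.array([1.0, 2.0, 3.0]), np.array([4.0, 0.0, 3.0])

euclidean = np.sqrt(((a - b) ** 2).sum())          # straight-line distance
manhattan = np.abs(a - b).sum()                    # city-block distance
chebyshev = np.abs(a - b).max()                    # largest coordinate gap
p = 3
minkowski = (np.abs(a - b) ** p).sum() ** (1 / p)  # p=1 Manhattan, p=2 Euclidean

print(euclidean, manhattan, chebyshev, minkowski)
```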
Explaining predictive models with mixed features using Shapley values and conditional inference trees
Redelmeier, Annabelle, Jullum, Martin, Aas, Kjersti
It is becoming increasingly important to explain complex, black-box machine learning models. Although there is an expanding literature on this topic, Shapley values stand out as a sound method to explain predictions from any type of machine learning model. The original development of Shapley values for prediction explanation relied on the assumption that the features being described were independent. This methodology was then extended to explain dependent features with an underlying continuous distribution. In this paper, we propose a method to explain mixed (i.e., continuous, discrete, ordinal, and categorical) dependent features by modeling the dependence structure of the features using conditional inference trees. We evaluate our proposed method against the current industry standards in various simulation studies and find that it often outperforms the other approaches. Finally, we apply our method to a real financial data set used in the 2018 FICO Explainable Machine Learning Challenge and show how our explanations compare to those of the FICO challenge Recognition Award winning team.
- Information Technology > Modeling & Simulation (1.00)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.47)